Abstract:

This paper proposes a new technique for solving near neighbor problems in the plane. We
illustrate our method on the following two problems:

1. k-Nearest Neighbor: Given a set S of n points in the plane and a query of the form (q, k), with
q a query point and k a positive integer, report the k points of S closest to q.

2. Circular Range Search: Given a set S of n points in the plane and a query of the form (q, d),
with q a query point and d a positive real number, report all the points of S that lie inside the
circle of radius d centered at q.

Our main results include algorithms that solve each of these problems in $O(k + \log n)$ query
time using $O(n^{1+\epsilon})$ space; k denotes the size of the output. We also show that it is
possible to solve either problem in $O(k \log^2 n)$ query time, using only $O(n \log n)$ space.
These results constitute significant improvements over previous methods, in particular regarding
the circular range search problem, which had previously defied efficient solutions.
1. Introduction


The abundant literature on near-neighbor problems [1-5,8,11,12] attests to the central place
that the notion of proximity occupies in computational geometry. Among the most powerful tools
available today for dealing with proximity relationships, the well-known Voronoi diagram stands
out as one of the most versatile as well. Even its failure to solve farthest point or k-closest point
problems can be easily remedied by introducing the notion of higher order Voronoi diagram [11].
With this construction in hand, it is possible to solve a number of near-neighbor problems, two of
which will be of special interest to us in this paper:


1. k-Nearest Neighbor: Given a set S of n points in the plane and a query of the form (q, k), with
q a query point and k a positive integer, report the k points of S closest to q.

2. Circular Range Search: Given a set S of n points in the plane and a query of the form (q, d),
with q a query point and d a positive real number, report all the points of S that lie inside
the circle of radius d, centered at q.

Every algorithm for a search problem to be solved in a repetitive mode over a fixed database
of n items is characterized by three complexity measures, M(n), Q(n), and P(n), which are,
respectively, the storage, the response time, and the preprocessing time. For obvious reasons, the
time spent organizing the data structure (preprocessing time) is not as important a cost measure as
M(n), so we shall temporarily ignore P(n) until the last section of this paper.

Previous work on the k-nearest neighbor problem includes a number of algorithms based on the
Voronoi diagram, the most efficient of which have the following characteristics [4,6]: $M(n) = O(n^3)$
and $Q(n) = O(k + \log n)$. Other (less efficient) methods were found earlier [5,8,11]. The circular
range search problem is also amenable to efficient treatment by means of Voronoi diagrams. A
straightforward extension of an algorithm given in [1] led to the best method known until now [4]; the
method allows us to report the k points within the query circle in time $Q(n) = O(k + \log n \log\log n)$
and requires $M(n) = O(n^3)$ storage. Although we are mainly concerned in this work with algorithms
that achieve optimal (or near-optimal) query times, we should still mention the existence of a
space-optimal algorithm, with the following characteristics [12]: $M(n) = O(n)$ and $Q(n) = O(k + n^{0.98})$.
For other methods, consult [1-3].

The purpose of this paper is to describe a novel representation scheme for higher-order Voronoi
diagrams that allows us to improve on the best algorithms previously known for the problems
mentioned above. The following table summarizes our main results; k designates the output size.


problem                           space                   query time

k-nearest neighbor (fixed k)      $O(k(n-k))$             $O(k + \log n)$
k-nearest neighbor                $O(n^{1+\epsilon})$     $O(k + \log n)$
k-nearest neighbor                $O(n \log n)$           $O(k \log^2 n)$
circular range search             $O(n^{1+\epsilon})$     $O(k + \log n)$
circular range search             $O(n \log n)$           $O(k \log^2 n)$

2.  Some  Background 

Let us briefly review the main steps of the algorithms proposed in [4] for solving the k-nearest
neighbor and the circular range search problems. In the following, ALG_NN (resp. ALG_CR) will
denote the algorithm of [4] for the former (resp. latter) problem. Let $\mathrm{Vor}_k(S)$ denote the order-k
Voronoi diagram of a set S of n points in the plane. Recall that $\mathrm{Vor}_k(S)$ is a subdivision of the
plane into convex regions, all of whose points have the same k nearest neighbors. More precisely,

$$\mathrm{Vor}_k(S) = \bigcup_{T \subseteq S;\ |T| = k} V_k(T),$$

where $V_k(T)$ is the locus of points that are closer to every point of T than to any point of S − T. The
complexity of higher-order Voronoi diagrams has been thoroughly analyzed by Lee [8], who showed
that the size of $\mathrm{Vor}_k(S)$ (e.g., the number of edges) is always $O(k(n-k))$, a fact which we express
by the relation


$$|\mathrm{Vor}_k(S)| = O(k(n-k)). \qquad (1)$$
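To make the definition of the cell $V_k(T)$ concrete, the following minimal sketch tests by brute force whether a point x lies in the cell of a given k-subset T. The function names `in_order_k_cell` and `k_nearest_brute` are hypothetical and chosen only for illustration; nothing here reflects how the diagram is actually computed.

```python
from math import dist

def in_order_k_cell(x, T, S):
    """True if x lies in V_k(T): every point of T is closer to x than any point of S - T."""
    rest = [p for p in S if p not in T]
    if not rest:
        return True
    return max(dist(x, t) for t in T) < min(dist(x, p) for p in rest)

def k_nearest_brute(x, S, k):
    """The k points of S closest to x, by sorting (illustration only)."""
    return sorted(S, key=lambda p: dist(x, p))[:k]
```

A point x lies in the cell of exactly the set formed by its k nearest neighbors, which is why locating x in the order-k diagram answers a k-nearest neighbor query.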


Both ALG_NN and ALG_CR involve computing the set of diagrams $\{\mathrm{Vor}_{2^i}(S) \mid 0 \le i \le \lfloor \log_2 n \rfloor\}$.
In order to efficiently retrieve the neighbors associated with each Voronoi polygon, we augment the
graph representation of each diagram $\mathrm{Vor}_k(S)$ with its neighbor-lists, i.e., the set of k neighbors
corresponding to each face. From Lee's findings, it then follows that $O(n^3)$ is an upper bound on
the storage required by both ALG_NN and ALG_CR.


Both algorithms were presented in [4] in order to illustrate the concept of filtering search. This
notion prescribes trading traditional searching techniques for a two-step approach: scoop-and-filter.
The idea is to collect (scoop) a set of O(k) points that is guaranteed to include the k desired ones and,
in a second stage, filter out the extraneous items. In the case of, say, ALG_NN, this idea materializes
as follows: first, compute the integer j such that $2^{j-1} < k \le 2^j$; then determine the face f of
$\mathrm{Vor}_{2^j}(S)$ that contains the point q; finally, retrieve from the list of neighbors associated with f
the k points closest to q. Using an optimal planar point location algorithm [7,10] to locate q and
a linear-selection algorithm to retrieve the k neighbors, ALG_NN can be easily shown to have the
following performance: $M(n) = O(n^3)$ and $Q(n) = O(k + \log n)$.
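The scoop-and-filter steps just described can be sketched as follows. This is only an illustration under stated assumptions: the hypothetical helper `scoop` stands in for point location in the order-$2^j$ diagram and is implemented by brute force, and `heapq.nsmallest` replaces the linear-selection step, so the running times of the sketch are not those of ALG_NN.

```python
import heapq
from math import dist

def scoop(points, q, m):
    """Stand-in for reading the neighbor-list of the face of Vor_m(S) containing q."""
    return heapq.nsmallest(m, points, key=lambda p: dist(p, q))

def alg_nn(points, q, k):
    j = 0
    while (1 << j) < k:          # smallest j with k <= 2^j, hence 2^(j-1) < k <= 2^j
        j += 1
    # Scoop: fewer than 2k candidates, guaranteed to contain the k nearest neighbors.
    candidates = scoop(points, q, min(1 << j, len(points)))
    # Filter: keep the k candidates closest to q.
    return heapq.nsmallest(k, candidates, key=lambda p: dist(p, q))
```

Because the scooped list has fewer than 2k entries, the filtering step touches O(k) points regardless of n.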

ALG_CR proceeds along similar lines. The main idea was originally proposed by Bentley and
Maurer in [1]. It involves locating q in $\mathrm{Vor}_{2^i}(S)$ for i = 0, 1, 2, ..., and examining the corresponding
neighbor-list, proceeding in this manner until we first encounter a point of S that lies further than
d from q. At this stage, we know that the desired points all lie in the neighbor-lists examined so
far, and only the last one may contain undesired points. A simple analysis shows that the performance
of ALG_CR is given by $M(n) = O(n^3)$ and $Q(n) = O(k + \log n \log\log n)$.
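The doubling strategy of ALG_CR can be sketched in the same spirit. Again this is a minimal illustration, not the algorithm of [4]: the hypothetical helper `nearest` replaces the lookup in the order-$2^i$ diagram by a brute-force sort, and points at distance exactly d are treated as inside the circle, which is an assumption of the sketch.

```python
from math import dist

def nearest(points, q, m):
    """Stand-in for locating q in Vor_m(S) and reading the neighbor-list of its face."""
    return sorted(points, key=lambda p: dist(p, q))[:m]

def alg_cr(points, q, d):
    m = 1
    while True:
        neighbors = nearest(points, q, m)
        # Stop at the first list holding a point farther than d, or once S is exhausted.
        if len(neighbors) == len(points) or any(dist(p, q) > d for p in neighbors):
            return [p for p in neighbors if dist(p, q) <= d]
        m *= 2                   # orders 1, 2, 4, ... as in the text
```

The last list examined has at most twice as many points as the answer plus one, so the final scan is proportional to the output size.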

We will show how to apply the notion of filtering search to make the representation of
neighbor-lists more economical, and thus improve on the above results. The basic idea is to partition
$\mathrm{Vor}_k(S)$ into sets of adjacent faces to which are attached augmented neighbor-lists. This allows us
to save a factor of n in the space requirements of ALG_NN and ALG_CR. Further application of filtering
search will lead to still greater improvements.


3. A New Representation of Higher Order Voronoi Diagrams
The purpose of this section is to show how neighbor-lists can be stored implicitly without
increasing the asymptotic complexity of $\mathrm{Vor}_k(S)$, i.e., using only $O(k(n-k))$ storage. We define
$\mathrm{Del}_k(S)$ as the dual graph of $\mathrm{Vor}_k(S)$, where vertices of the former and faces of the latter are in
one-to-one correspondence, and adjacent vertices in $\mathrm{Del}_k(S)$ correspond to adjacent faces in $\mathrm{Vor}_k(S)$
(faces are adjacent if they share an edge and not just a vertex). Note that if no four points in S
are co-circular, the vertices of $\mathrm{Vor}_k(S)$ have degree three, therefore $\mathrm{Del}_k(S)$ is a triangulation. At
any rate, the dual graph is always connected, which makes it possible to define a spanning tree of
$\mathrm{Del}_k(S)$, denoted $\mathcal{T}$. For the sake of simplicity, we transform $\mathcal{T}$ into a binary tree $T^*$ (i.e., a tree
with all degrees at most three), by reducing high degrees if necessary. To do so, assume that v is
a vertex of $\mathcal{T}$ of degree m > 3 and let $v_1, \ldots, v_m$ be its adjacent vertices in clockwise order. We
replace v by m − 2 vertices $w_1, \ldots, w_{m-2}$, defined as follows: $w_1$ is adjacent to $v_1, v_2, w_2$ and $w_{m-2}$
to $w_{m-3}, v_{m-1}, v_m$; each other vertex $w_i$ is adjacent to $w_{i-1}, v_{i+1}, w_{i+1}$.

It is easy to see that this transformation of $\mathcal{T}$ can at most double the original number of vertices:
indeed, let P denote the number of vertices of $\mathrm{Del}_k(S)$ and let $\nu_i$ be the number of vertices of degree
i in $\mathcal{T}$. If $\delta$ is the maximum degree in $\mathcal{T}$, we have $\sum_{1 \le i \le \delta} \nu_i = P$ and $\sum_{1 \le i \le \delta} i\nu_i = 2(P-1)$
(since $\mathcal{T}$ is a tree). Let $|T^*|$ denote the number of vertices of $T^*$. Since each vertex of $\mathcal{T}$ of degree
$i \ge 3$ gives way to i − 2 vertices in $T^*$, the size of $T^*$ is given by

$$|T^*| = \nu_1 + \nu_2 + \sum_{3 \le i \le \delta} \nu_i (i-2) = \sum_{1 \le i \le \delta} i\nu_i - \nu_2 - 2 \sum_{3 \le i \le \delta} \nu_i \le 2P - 2. \qquad (2)$$
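As a quick sanity check of the counting argument, consider the hypothetical case where the spanning tree is a star with one center of degree 4 and four leaves, so that P = 5. The center is replaced by two vertices of degree three, and the count reads:

```latex
\[
\nu_1 = 4,\quad \nu_2 = \nu_3 = 0,\quad \nu_4 = 1,\qquad
|T^*| = \nu_1 + \nu_2 + \sum_{3 \le i \le 4} \nu_i\,(i-2) = 4 + 0 + 1 \cdot 2 = 6 \;\le\; 2P - 2 = 8 .
\]
```

The same value is obtained from the middle expression of the bound: the degree sum is 2(P − 1) = 8, from which 0 (for $\nu_2$) and 2 (for twice the single vertex of degree at least three) are subtracted.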

The following is common knowledge and is thus given without proof.

Lemma: Let T be a binary tree with m vertices. It is possible to find, in O(m) operations, an edge
of T whose removal leaves two (connected) subtrees $T^{(1)}$ and $T^{(2)}$, with at most 2m/3 vertices each.
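One way to realize the edge selection of the lemma is sketched below, assuming the tree is given as an adjacency dictionary (a hypothetical input format chosen for illustration) with at least two vertices. The sketch computes subtree sizes in one traversal and returns the edge that minimizes the larger of the two sides, which takes O(m) operations.

```python
def best_split_edge(adj, root=0):
    """adj: dict vertex -> list of neighbors of a tree with maximum degree 3 (m >= 2).
    Returns (edge, larger_side): the edge whose removal minimizes the larger side."""
    m = len(adj)
    parent = {root: None}
    order = [root]
    for u in order:                       # breadth-first order; the list grows as we go
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
    size = {u: 1 for u in adj}
    for u in reversed(order):             # accumulate subtree sizes bottom-up
        if parent[u] is not None:
            size[parent[u]] += size[u]
    best = min((u for u in order if parent[u] is not None),
               key=lambda u: max(size[u], m - size[u]))
    return (parent[best], best), max(size[best], m - size[best])
```

On a path with 9 vertices the larger side has 5 vertices, within the 2m/3 bound. For very small trees the bound holds only up to rounding: a 4-vertex star necessarily splits into parts of sizes 3 and 1.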

The decomposition process embodied by the lemma can be applied recursively on the tree $T^*$,
until we achieve a decomposition of the original tree into connected subtrees $T^*_1, \ldots, T^*_q$, whose
numbers of vertices all lie between k and 3k. This is always possible since in each
splitting step each component has at least one third as many vertices as its parent. Note that a
given vertex of $\mathrm{Del}_k(S)$ (i.e., a face of $\mathrm{Vor}_k(S)$) may be represented as a vertex in several of the
subtrees $T^*_1, \ldots, T^*_q$. We will, however, allow only one instance to be accessible in the search process.

We now argue that neighbor-lists within any subtree $T^*_i$ cannot differ by a large amount. Indeed,
let $\sigma(u)$ be the neighbor-list of the face of $\mathrm{Vor}_k(S)$ that corresponds to the vertex u of $T^*$. Two
adjacent vertices $u, v \in T^*$ correspond either to the same face of $\mathrm{Vor}_k(S)$ or to two adjacent faces.
For this reason, $\sigma(u)$ and $\sigma(v)$ differ in at most one element. This allows us to set up an implicit
representation of neighbor-lists within each subtree $T^*_i$. The simplest solution would consist of
merging all the neighbor-lists into a superset $S_i = \bigcup_{u \in T^*_i} \sigma(u)$ for i = 1, ..., q. Since $T^*_i$ does
not have more than 3k vertices and each neighbor-list has exactly k elements, we have the relation
$|S_i| < 4k$. Since, on the other hand, $|T^*_i| \ge k$, we have $q \le |T^*|/k$, from which we readily derive
$\sum_{1 \le i \le q} |S_i| \le q \max_i |S_i| < \frac{|T^*|}{k} \cdot 4k = 4|T^*|$; hence from (2)

$$\sum_{1 \le i \le q} |S_i| < 8P. \qquad (3)$$

We complete the preprocessing of the planar graph $\mathrm{Vor}_k(S)$ by organizing it for efficient planar
point location [7,10], and associating with each face f the index of the subtree $T^*_i$ containing the
dual vertex of f. As mentioned earlier, this index may not be unique; if there are several candidates
we simply choose any one of them. We are now in a position to determine the k nearest neighbors
of a query point q by locating the face of $\mathrm{Vor}_k(S)$ that contains q. Since the corresponding set $S_i$
contains at most 4k elements, we can apply a linear-selection algorithm to retrieve the k nearest
neighbors of q in O(k) time.

It is possible to circumvent the difficulties inherent to linear-selection methods by slightly refining
the representation of $S_i$. Choose any vertex $v^{(i)}$ in $T^*_i$ (i = 1, ..., q) and call it the root. The set $\sigma(v^{(i)})$
is represented in full by means of a linked list. Since $\sigma(u)$ and $\sigma(v)$ differ in at most one element
when u and v are adjacent vertices in $T^*_i$, we can compute the lists $\sigma(v)$ incrementally. Indeed, if
$\sigma(u)$ is available, replacement of one item in a known position of $\sigma(u)$ enables us to compute $\sigma(v)$;
this is done by simply specifying the position in $\sigma(u)$ and the item to be inserted as replacement.


Thus, assuming that we have a linked list representation of $T^*_i$, suppose that we wish to compute
$\sigma(v)$, where v is an arbitrary vertex of $T^*_i$. We identify the unique path in $T^*_i$ from v to $v^{(i)}$ and
traverse it backward, at each step updating the current point list. This list will be precisely $\sigma(v)$ at
the termination of the traversal. Once again the report time will be O(k), since $|T^*_i| \le 3k$. This
scheme necessitates the storing of $\sigma(v^{(i)})$ and a fixed amount c of data per edge in $T^*_i$. Thus the
added storage $C_i$ associated with $T^*_i$ is

$$C_i = |\sigma(v^{(i)})| + c\,(|T^*_i| - 1) < k + c\,|T^*_i|.$$

Therefore the total storage can be bounded from above as follows:

$$\sum_{1 \le i \le q} C_i < qk + c \sum_{1 \le i \le q} |T^*_i| \le \frac{|T^*|}{k} \cdot k + c\,|T^*| = (c+1)\,|T^*| < 2(c+1)P,$$

where use has been made of $q \le |T^*|/k$ and of Relation (2).
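The incremental scheme can be sketched as follows. The sketch assumes a hypothetical encoding in which every non-root vertex stores its parent and an optional pair (position, item) describing the single replacement that turns the parent's list into its own; a Python list is used in place of a linked list for brevity.

```python
def neighbor_list(v, root_list, parent, patch):
    """Rebuild sigma(v) from the full list stored at the root of the subtree.
    parent[u]: parent of u in the rooted subtree (None at the root).
    patch[u]:  (position, item) turning sigma(parent[u]) into sigma(u),
               or None when the two lists coincide."""
    path = []
    while parent[v] is not None:          # climb from v to the root
        path.append(v)
        v = parent[v]
    current = list(root_list)             # copy of sigma(root), k items
    for u in reversed(path):              # replay the replacements from the root down to v
        if patch[u] is not None:
            pos, item = patch[u]
            current[pos] = item
    return current
```

Each subtree has at most 3k vertices, so the path has length O(k) and the reconstruction costs O(k), matching the report time stated above; the per-edge storage is the constant-size `patch` entry.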

Whatever the strategy chosen, Relation (3) and the inequality above show that the total amount
of storage needed is O(P). Since, as in any planar graph, the number P of faces in $\mathrm{Vor}_k(S)$ is
dominated by the number of edges, Relation (1) shows that the overall storage used by the algorithm
is $O(k(n-k))$.

Theorem 1. When the value of k is fixed (i.e., k is not part of the query), it is possible to solve the
k-nearest neighbor problem in $O(k + \log n)$ time, using $O(k(n-k))$ space.

Throughout this paper we will use the notation $\mathrm{Vor}^*_k(S)$ to designate the data structure used in
Theorem 1, i.e., the order-k Voronoi diagram of S, augmented with the implicit representation of
neighbor-lists and preprocessed for efficient planar point location.

4. The k-Nearest Neighbor Problem

Algorithm ALG_NN can be used exactly as described in Section 2, the only difference coming
from the new representation of higher-order Voronoi diagrams now used. Theorem 1 shows that the
overall storage needed is

$$O\Big(\sum_{0 \le i \le \lfloor \log_2 n \rfloor} 2^i (n - 2^i)\Big) = O(n^2).$$

The query time is clearly $O(2^j + \log n)$, with $2^{j-1} < k \le 2^j$, hence $O(k + \log n)$. This shows that
it is possible to solve the k-nearest neighbor problem in $O(k + \log n)$ time, using $O(n^2)$ space. As
will be established in the following, this improvement can be taken much further with a more subtle
use of filtering search.
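The quadratic storage bound follows from a geometric sum; one way to spell out the step is:

```latex
\[
\sum_{0 \le i \le \lfloor \log_2 n \rfloor} 2^i\,(n - 2^i)
\;\le\; n \sum_{0 \le i \le \lfloor \log_2 n \rfloor} 2^i
\;<\; n \cdot 2^{\lfloor \log_2 n \rfloor + 1}
\;\le\; 2n^2 .
\]
```

The largest diagrams dominate: the terms with $2^i$ close to n/2 already contribute on the order of $n^2$, so the bound cannot be improved without changing the set of diagrams stored.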

4.1.  Outline  of  the  approach 

We begin with an informal description of the basic idea of the approach. In the data structure
$\mathrm{Vor}^*_j(S)$ we call the parameter j the scope of the search. In the previously outlined ALG_NN the
storage cost is mainly due to the necessity of having a Voronoi diagram with the adequate search
scope over the entire set S for any query: this is because we required the filtering search to be
carried out as a single-stage process. If instead we explore the idea of a multistage search, at each
stage we could use the information so far acquired to devise the best strategy for the next stage.
Specifically, suppose that in the course of the process we have partitioned the original set S into
several nontrivial subsets, and, as a policy, we explore the "most promising" subset with an increasing
search scope. Such a scheme would have the property that, while the search scope increases, the size
of the searched set decreases, which bears the promise of reduced storage requirement.

More formally, the main (primary) search structure is appropriately described as a rooted tree
T. A search, prompted by the query (q, k), is to be viewed as the visit of a subtree $T_q$ of T, where
the term "subtree" refers here to any connected subgraph of T that contains the root. Associated
with each node v of T there is a set $S(v) \subseteq S$, and a (secondary) search data structure $\mathrm{Vor}^*_{k(v)}(S(v))$,
where k(v) is the search scope at node v. We also define: $\Gamma(v)$ is the set of offspring nodes of v;
h(v), the depth of v, is the number of ancestors of v in T; level j of T is the set of all nodes v with
h(v) = j.

If $\Gamma(v) = \{w_1, \ldots, w_c\}$ ($c \ge 2$), then we stipulate that $\{S(w_1), \ldots, S(w_c)\}$ is a non-trivial
partition of S(v). Thus, if we set S(root) = S, it is immediate that the nodes on
a given level of T define a partition of S, and that the level-(j + 1) partition is a proper refinement
of the level-j partition ($j \ge 0$).

We now restrict the structure of T by imposing the following regularity constraints:

1. All internal nodes of T have the same degree $c \ge 1$;

2. The subsets of S(v) assigned to the offspring of v are (nearly) equally sized (this is trivial
when c is equal to 1 and is readily obtained by requiring $n = c^m$ otherwise);

3. If w is the parent of v, then $k(v) = f \times k(w)$, where f > 1.

Note that we may choose $k(\mathrm{root}(T)) = \lfloor \log_2 n \rfloor$, since this will equalize search time and report
time for root(T). Moreover, $\mathrm{Vor}^*_{k(v)}(S(v))$ is replaced by S(v) itself if $|S(v)| \le k(v)$; thus, if v is a
leaf of T and w is its parent, we wish to have $|S(v)| \le k(v)$ and $|S(w)| > k(w)$. This characterizes
the leaves, and the depth h of T is the smallest integer satisfying the following inequality:

$$|S(v)| = n/c^h \le \lfloor \log_2 n \rfloor \cdot f^h = k(v),$$

or equivalently $(fc)^h \ge n/\lfloor \log_2 n \rfloor$. We can thus express h as follows:

$$h = \left\lceil \frac{\log n - \log \lfloor \log_2 n \rfloor}{\log(fc)} \right\rceil. \qquad (4)$$

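The storage requirement of T is readily evaluated. The data structure $\mathrm{Vor}^*_{k(v)}(S(v))$ is stored
in $O(k(v) \cdot |S(v)|)$ space (see Section 3); since k(v) is constant for all nodes v at the same depth, the
storage requirement for all nodes at level h(v) of T is $O(k(v) \sum |S(v)|) = O(n\,k(v))$. Recalling that
$k(v) = \lfloor \log_2 n \rfloor \cdot f^{h(v)}$, we derive that the total storage requirement M(n) of T is

$$M(n) = O\Big(n \log n \sum_{0 \le j \le h} f^j\Big) = O(n f^h \log n). \qquad (5)$$

Substituting (4) into (5), we obtain

$$M(n) = O\Big(n^{1 + \frac{\log f}{\log(fc)}} \cdot (\log_2 n)^{1 - \frac{\log f}{\log(fc)}}\Big). \qquad (6)$$

Therefore, given any $\epsilon > 0$, if we choose f and c so that $\log f/\log(fc) = \epsilon$, we can achieve storage
$M(n) = O(n^{1+\epsilon})$ (the residual factor $(\log_2 n)^{1-\epsilon}$ is absorbed by an arbitrarily small increase of $\epsilon$).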
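The shape of the primary tree can be computed directly from the defining inequality for the leaves. The following sketch is only illustrative; the function name `tree_shape` is hypothetical, and it assumes integer c and f with n at least 2.

```python
from math import floor, log2

def tree_shape(n, c, f):
    """Return (h, scopes): the depth h of T and the scope k(v) used on levels 0..h."""
    k_root = floor(log2(n))               # scope at the root
    h = 0
    while n / c**h > k_root * f**h:       # a leaf needs |S(v)| = n / c^h <= k(v)
        h += 1
    scopes = [k_root * f**j for j in range(h + 1)]
    return h, scopes
```

For example, with $n = 2^{16}$ and c = f = 2 the root scope is 16 and the loop stops at h = 6, where $n/c^h = 1024 = 16 \cdot 2^6$. Multiplying each scope by n and summing over the levels gives the quantity estimated in (5), up to the constant hidden in the space bound of Theorem 1.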
4.2. Answering a Query


The subtree $T_q$ of T which describes the search prompted by a given query (q, k) is grown one
node at a time, starting at the root r. The primitive operation used by the search process is the
visit of a node v, and consists of the following actions:

Step 1: Locate q in $\mathrm{Vor}^*_{k(v)}(S(v))$ and report the set $N_q(v)$ of the k(v) closest neighbors of q in S(v)
($N_q(v)$ is the set retrieved at v).

Step 2: Determine the member of $N_q(v)$, called far(v), which is farthest from q.

If $T^*$ denotes the subtree of T visited so far by the search process, then

$$\bigcup_{v \in \mathrm{leaves\ of\ } T^*} N_q(v)$$

denotes the total retrieved set.

The process makes use of a priority queue R that contains a subset of the nodes of T. The
ordering in R is based on the value of the distance of far(v) from q, and the top of R (the node to be
extracted) is the node in R for which this distance is minimum. Queue R is managed as follows:
initially, the root r of T is visited and inserted into the queue. At the generic step, the node v at the
top of R is extracted and, if it is not a leaf of T, all of its offspring are visited and inserted into the
queue. By this process, it is immediately recognized that the nodes in R are leaves of $T^*$, all nodes
of which have been visited by the search. Note, however, that not all leaves of $T^*$ are necessarily in
R; indeed, a leaf of $T^*$ that is also a leaf of T, upon leaving R, does not generate any new leaves
to be re-inserted. Each visited node is marked and the process terminates when the set of marked
nodes is the node set of $T_q$ (i.e., when $T^* = T_q$). We shall now develop the criterion for termination,
which is based on the claim that when the total retrieved set is large enough it is guaranteed to
contain the desired k nearest neighbors of q.

Let C be the disc centered at q that passes through the k-th nearest neighbor of q. Without
loss of generality, we may assume that only one point lies on the boundary of C, so as to ensure
that the notion of "k nearest neighbors" is well defined. A node v is said to be saturated if far(v) lies
inside C, and unsaturated otherwise. For each saturated node v, New(v) denotes the set of points
that are discovered for the first time while visiting v: New(v) represents the contribution of v to the
k nearest neighbors of q, and |New(v)| is, in bookkeeping terms, the net revenue for a cost of k(v)
retrieved points. We claim that $|\mathrm{New}(v)| \ge (1 - 1/f)\,k(v)$. Indeed, any point $p \in N_q(v)$ previously
discovered must have been encountered for the first time while visiting an ancestor of v and must
also belong to $N_q(\mathrm{father}(v))$. Thus the set of newly discovered points is

$$\mathrm{New}(v) = N_q(v) - S(v) \cap N_q(\mathrm{father}(v)),$$

and its cardinality is $|N_q(v)| - |S(v) \cap N_q(\mathrm{father}(v))|$. Since $|N_q(v)| = k(v)$ and
$|S(v) \cap N_q(\mathrm{father}(v))| \le |N_q(\mathrm{father}(v))| = k(v)/f$, at least $k(v) - k(v)/f$ points are discovered
for the first time at v, as claimed. Thus

$$|N_q(v)| \le \frac{f}{f-1}\,|\mathrm{New}(v)|. \qquad (7)$$

We now observe that an unsaturated node u will not be extracted from the queue R as long as
R contains at least one saturated node v; indeed, the distance from q to far(u) dominates the
distance from q to far(v), so that the presence of v prevents u from appearing at the top of R. This
implies that the visit of $T_q$ is completed when the last saturated node $v^*$ leaves R. Indeed, after
$v^*$ has been extracted, R contains only unsaturated nodes; for any such node u, $N_q(u)$ contains at
least one point of S outside C, which implies that all the k nearest neighbors of q have already been
retrieved (i.e., they belong to the set $\bigcup_{v \in \mathrm{leaves}(T^*)} N_q(v)$), so $T_q$ has been visited entirely.

Let us now return to consider an unsaturated node u, offspring of a saturated node w. After
the extraction from R of the (saturated) node w, the visits of its offspring (all inserted into R) have been
done at a cost of $c\,|N_q(u)| = cf\,|N_q(w)|$ retrieved points. We charge the cost of the
visit of each unsaturated offspring u to its saturated parent w; in the worst case, all offspring are
unsaturated, so that w gets charged the additional cost $c \cdot |N_q(u)| = cf\,|N_q(w)| \le \frac{cf^2}{f-1}\,|\mathrm{New}(w)|$.

Consider now the event consisting of the extraction of the last saturated node from R. At this
point, $\sum_{v\ \mathrm{saturated}} |\mathrm{New}(v)| \le k$, since the sets New(v) of the saturated nodes are pairwise disjoint
and consist of points inside C, all of which belong to the k nearest neighbors of q.
This event can be detected by controlling a cumulative sum

$$\sum_{v\ \mathrm{visited}} k(v).$$

Indeed, let V denote the set of nodes visited; this set is partitioned into $V_1$, the subset of saturated
nodes, and $V_2$, the subset of unsaturated nodes. We have

$$A = \sum_{v \in V} k(v) = \sum_{v \in V} |N_q(v)| = \sum_{v \in V_1} |N_q(v)| + \sum_{u \in V_2} |N_q(u)|.$$

For a saturated v, we have $|N_q(v)| \le \frac{f}{f-1}\,|\mathrm{New}(v)|$ (Relation 7), so that

$$\sum_{v \in V_1} |N_q(v)| \le \frac{f}{f-1} \sum_{v \in V_1} |\mathrm{New}(v)|.$$

On the other hand, the cost of retrieving $N_q(u)$ for each unsaturated node u is charged to its
saturated parent, as discussed earlier. So, let now $V_3$ denote the subset of $V_1$ with unsaturated
offspring. We have

$$\sum_{u \in V_2} |N_q(u)| \le \frac{cf^2}{f-1} \sum_{v \in V_3} |\mathrm{New}(v)| \le \frac{cf^2}{f-1} \sum_{v \in V_1} |\mathrm{New}(v)|,$$


and, in conclusion:

$$A \le \frac{(1 + cf)\,f}{f-1} \sum_{v \in V_1} |\mathrm{New}(v)| \le \frac{(1 + cf)\,f}{f-1}\,k.$$

Since $\frac{(1 + cf)\,f}{f-1}\,k$ is the maximum value that A can assume as the last saturated node departs from
R, the condition

$$A \ge \frac{(1 + cf)\,f}{f-1}\,k = c_1 k$$

can be used to detect the termination of the scooping phase of the filtering search.
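The constant $c_1$ is simply the sum of the two charges derived above, one for the saturated nodes themselves and one for their unsaturated offspring. With the hypothetical choice c = f = 2, used here purely as a numerical illustration:

```latex
\[
\frac{f}{f-1} + \frac{c f^2}{f-1} = \frac{(1 + cf)\,f}{f-1} = c_1 ,
\qquad
c = f = 2 \;\Longrightarrow\; c_1 = \frac{(1 + 4) \cdot 2}{1} = 10 .
\]
```

In that setting the scooping phase would retrieve at most about ten times as many points as are finally reported.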


We can now describe the search algorithm. To provide for efficient updating, the priority queue
R will be a dynamic heap (e.g., a 2-3 tree). This will allow us to retrieve the top of the queue, as well
as perform insertions and deletions, all in logarithmic time. For convenience, we should keep in the
queue pairs of the form (v, far(v)). We have:

Initial Step: If $k \le \log_2 n$, then visit the root r and halt; else let A := 0, $T^* := \emptyset$, and insert
(r, far(r)) into R.

General Step: Let v be the node at the top of R and let $A = \sum_{v \in T^*} k(v)$.

1. Assume that $A < c_1 k$. If v is a leaf of T, delete v from R and iterate. Otherwise, delete v
from R, visit each child z of v, and insert (z, far(z)) into both R and $T^*$. Update the value of
A and iterate. Updating A involves adding $c\,k(z)$ to its current value (where again z is the generic
child of v).

2. If $A \ge c_1 k$, apply a linear-selection algorithm to the set $\bigcup_{v \in \mathrm{leaves}(T^*)} N_q(v)$ and determine
the k points closest to q. These points constitute the k nearest neighbors of q in S.

With the previous analysis at hand, evaluating the query time of the algorithm is quite
straightforward. Processing node v requires time $O(k(v) + \log n)$, which is also $O(k(v))$, since
$k(v) = \Omega(\log n)$. The query time will thus be O(A), so we can conclude

$$Q(n) = O(k + \log n).$$

Thus, we have derived the main result of this section.

Theorem 2. It is possible to solve the k-nearest neighbor problem in $O(k + \log n)$ time, using
$O(n^{1+\epsilon})$ space, where $\epsilon > 0$ is an arbitrarily small real number.

We close this section with two remarks. The first is that the constant that multiplies
k in the expression $Q(n) = O(k + \log n)$ is proportional to $c_1 = (1 + cf)f/(f - 1)$. Since
$\epsilon = \log f/\log(fc)$, for fixed f, we easily derive that $c_1 = O(f^{1/\epsilon})$. Finally, we note that if we let c = 1,
we obtain a scheme where the primary structure T is a chain (and, consequently, S(v) = S for each
$v \in T$). In this case the global storage (see (6)) becomes $O(n^2)$ and the method behaves like the
one described in Section 3 (save for the replacement of bisection search with a sequential search).


4.3. A More Space-efficient Solution


It is possible to lower the space requirements of the previous algorithm at the price of some
increase in the query time. We will present an $O(n \log n)$ data structure that allows a query to be
answered in time $O(k \log^2 n)$. This method can be of great interest when the application specifically
requires that the k neighbors be sorted by distance to q. As we will see, another asset of this scheme
is its utter simplicity. Let T be a complete binary tree defined over the n points of S; each leaf
of T corresponds to a distinct point, with the n points appearing in ascending x-order from left to
right. Each node v of T spans a subset S(v) of S consisting of the points stored at the leaves of the
subtree rooted at v. The preprocessing involves computing the (order-1) Voronoi diagram of each
subset S(v); this can be done in $O(n \log n)$ time by using the divide-and-conquer algorithm of [11].
Each Voronoi diagram will be preprocessed for efficient planar point location, using Kirkpatrick's
algorithm [7]. Aside from its conceptual simplicity, this point location method has the advantage
over [10] of requiring only linear preprocessing, provided that the edges of the graph are already
ordered around each of their adjacent vertices. This is precisely the case with the Voronoi diagram
construction of [11], therefore the entire preprocessing will take $O(n \log n)$ time.

We answer a query (q, k) by first computing the nearest neighbor of q in S, using the structure
$\mathrm{Vor}_1(S(r))$, where r is as usual the root of the tree, and S(r) = S. Next we visit the two offspring
of r and proceed as in the method described in the preceding section; in the present case the
priority queue R yields the neighbors of q in order of increasing distance. There are a few obvious
modifications, suggested by the special nature of the problem: let p be the point just extracted from
the top of the queue, and let v be the corresponding node in T. It is easy to see that p will "drag" the
computation all the way down to the leaf, w, where it is stored. Once this leaf has been reached, we
will delete p from the queue and iterate. Note that it is useless to search the structures $\mathrm{Vor}_1(S(z))$
encountered on the path from v to w, since this will always produce the same answer, i.e., p. Instead,
we shall just visit the siblings of the nodes on this path, thereby cutting the search work in half.
Thus, since the number of nodes of T visited by the search is $O(k \log n)$ (see [9])
and each visit has a cost of $O(\log n)$, we conclude


Theorem 3. It is possible to solve the k-nearest neighbor problem in $O(k \log^2 n)$ time, using $O(n \log n)$
space.
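
The query procedure behind Theorem 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper nearest scans S(v) by brute force and merely stands in for point location in $\mathrm{Vor}_1(S(v))$, the nodes of T are represented implicitly by index ranges over the x-sorted points, and the name k_nearest is hypothetical.

```python
import heapq
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]


def k_nearest(points: List[Point], q: Point, k: int) -> List[Point]:
    pts = sorted(points)                  # leaves of T in ascending x-order

    def nearest(lo: int, hi: int) -> Tuple[float, int]:
        # Brute-force stand-in for point location in Vor_1(S(v)),
        # where S(v) = pts[lo:hi].
        return min((dist(pts[i], q), i) for i in range(lo, hi))

    heap: List[Tuple[float, int, int, int]] = []   # the priority queue R
    if pts:
        d0, i0 = nearest(0, len(pts))
        heap.append((d0, i0, 0, len(pts)))
    out: List[Point] = []
    while heap and len(out) < k:
        _, i, lo, hi = heapq.heappop(heap)
        out.append(pts[i])
        # Follow pts[i] down to its leaf, querying only the siblings
        # of the nodes on the path.
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if i < mid:
                sib_lo, sib_hi, hi = mid, hi, mid
            else:
                sib_lo, sib_hi, lo = lo, mid, mid
            d1, j = nearest(sib_lo, sib_hi)
            heapq.heappush(heap, (d1, j, sib_lo, sib_hi))
    return out
```

At any moment the nodes held in the queue partition the points not yet reported, so the top of the queue is always the next neighbor in order of increasing distance.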

5.  The  Circular  Range  Search  Problem 

Before  resorting  to  the  fairly  heavy  machinery  of  Section  4  in  order  to  produce  a  near-optimal 
algorithm  for  the  circular  range  search  problem,  we  wish  to  show  how  a  simple  application  of  Theorem 
1  leads  to  a  significant  improvement  over  the  algorithm  ALGcr  of  Section  2. 

5.1.  A  Preliminary  Algorithm 

We will describe an algorithm with performance $Q(n) = O(k + \log n)$ and $M(n) = O(n^2)$ (with
k being the output size). Recall that the basic idea behind ALGcr is to retrieve the neighbor-lists
of the regions of $\mathrm{Vor}_j(S)$ containing the query point q, for $j = 2^i$; $i = 0, 1, \ldots$ This process will
stop at the first encounter with a point further than d from q. At this stage, the current neighbor-list
will be a superset of the desired set, with at most 2k points, therefore a simple scan through it will
complete the work. Since a total of $O(\log k)$ neighbor-lists will thus be examined, the query time
of the algorithm is $O(k + \log k \log n)$, which can be easily shown to be $O(k + \log n \log\log n)$. If we
substitute the data structure of Theorem 1, built for each $\mathrm{Vor}_j(S)$, for the combination
{$\mathrm{Vor}_j(S)$, neighbor-lists}, we lower the storage requirements to

$$M(n) = O\Big(\sum_{0 \le i \le \log_2 n} 2^i n\Big) = O(n^2).$$
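
The doubling strategy recalled above can be sketched in Python. The function neighbor_list is a hypothetical brute-force stand-in for locating q in $\mathrm{Vor}_j(S)$ and reading off the neighbor-list of its region; none of the actual data structures are built, so only the control flow is illustrated.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]


def neighbor_list(points: List[Point], q: Point, j: int) -> List[Point]:
    # Stand-in for locating q in Vor_j(S): the j points of S closest to q.
    return sorted(points, key=lambda p: dist(p, q))[:j]


def circular_range(points: List[Point], q: Point, d: float) -> List[Point]:
    j = 1
    while True:
        lst = neighbor_list(points, q, j)
        # Stop at the first list holding a point further than d from q,
        # or once the list already covers all of S.
        if any(dist(p, q) > d for p in lst) or j >= len(points):
            return [p for p in lst if dist(p, q) <= d]
        j *= 2
```

The last list examined contains every point within d of q, because it consists of the closest points of S and already includes one that is too far.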

We can improve the query time by slightly reorganizing the computation. Let $r = \lceil \log_2 \log_2 n \rceil$.
The first step consists of retrieving the sought neighbor-list in $\mathrm{Vor}_{2^r}(S)$. If it contains any point
further than d from q, we complete the computation by filtering out the extraneous items, at a
total cost of $O(2^r + \log n) = O(\log n)$ operations. If on the other hand all the points in the
list lie within a distance d of q, we must pursue the search with larger scope. We return to the
previous mode of operation, retrieving the sought neighbor-lists in $\mathrm{Vor}_{2^{r+1}}(S)$, $\mathrm{Vor}_{2^{r+2}}(S)$, ... Let
$\mathrm{Vor}_{2^l}(S)$ be the last Voronoi diagram investigated. A total of $l - r + 1$ neighbor-lists will have
been examined, hence

$$Q(n) = O((l - r + 1) \log n + 2^l).$$

Since $\log_2 n \le 2^r$, we have

$$(l - r + 1) \log_2 n + 2^l \le (l - r + 2^{l-r} + 1)\, 2^r \le (2^{l-r+1} + 1)\, 2^r \le 2^{l+2},$$

therefore $Q(n) = O(2^{l+2})$. The examination of $\mathrm{Vor}_{2^{l-1}}(S)$ produces only points within d of q,
therefore $k \ge 2^{l-1}$. This implies that $Q(n) = O(k)$, and completes the proof that in all cases
$Q(n) = O(k + \log n)$.
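
As a numerical sanity check of the chain of inequalities above, the following Python sketch evaluates both sides for a few values of n and l. The function names are hypothetical, and the check is only an illustration; it does not replace the derivation.

```python
from math import ceil, log2


def start_exponent(n: int) -> int:
    """r = ceil(log2 log2 n), so that log2 n <= 2**r (requires n >= 2)."""
    return max(0, ceil(log2(log2(n))))


def cost_bound_holds(n: int, l: int) -> bool:
    """Check (l - r + 1) * log2(n) + 2**l <= 2**(l + 2) for l >= r."""
    r = start_exponent(n)
    cost = (l - r + 1) * log2(n) + 2 ** l
    return cost <= 2 ** (l + 2)


# Evaluate the bound for several set sizes and for l = r, r + 1, ...
checks = [
    cost_bound_holds(n, l)
    for n in (2, 10, 1000, 10 ** 6)
    for l in range(start_exponent(n), start_exponent(n) + 20)
]
```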

5.2.  A  Near-Optimal  Algorithm 

The preprocessing is absolutely identical to the one described in Section 4.1. We construct the
tree T with the data structure $\mathrm{Vor}_{k(v)}(S(v))$ attached to each of its nodes v. Answering a query can
now be described recursively quite simply: starting at v = root, retrieve the k(v) nearest neighbors
of the query point q. If any of these neighbors is found not to lie within a distance d of q, the subset
of neighbors that do lie within d of q can be reported immediately, and the entire subtree rooted at
v need not be further examined. If on the other hand all the neighbors lie within d of q, we must
pursue the exploration of T, and to do so we distinguish between two cases: if v is a leaf of T, we
report all the neighbors just found, otherwise no reporting takes place; instead, we iterate on the
same process with respect to each of the c children of v.
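
The recursive query can be sketched in Python under explicit assumptions. Section 4.1 is not reproduced here, so the values k(v) used below (starting at about $\log_2 n$ at the root and doubling at each level) and the rule that a node is a leaf once $|S(v)| \le k(v)$ are hypothetical choices made for this illustration. The k(v) nearest neighbors are found by brute force, standing in for the structure attached to each node.

```python
from math import dist, floor, log2
from typing import List, Tuple

Point = Tuple[float, float]


def range_search(points: List[Point], q: Point, d: float, c: int = 4) -> List[Point]:
    pts = sorted(points)                               # leaves in x-order
    k_root = max(1, floor(log2(max(2, len(pts)))))     # assumed k(root)
    out: List[Point] = []

    def visit(lo: int, hi: int, k_v: int) -> None:
        subset = pts[lo:hi]                            # S(v)
        # Stand-in for retrieving the k(v) nearest neighbors of q in S(v).
        nbrs = sorted(subset, key=lambda p: dist(p, q))[:k_v]
        inside = [p for p in nbrs if dist(p, q) <= d]
        is_leaf = hi - lo <= k_v                       # assumed leaf rule
        if len(inside) < len(nbrs) or is_leaf:
            out.extend(inside)                         # subtree is finished
            return
        step = -(-(hi - lo) // c)                      # ceil((hi - lo) / c)
        for start in range(lo, hi, step):              # the (at most) c children
            visit(start, min(start + step, hi), 2 * k_v)

    if pts:
        visit(0, len(pts), k_root)
    return out
```

Each point within d of q is reported exactly once, at the node where the search of its subtree ends, since the children of a node partition S(v) and an internal node whose neighbors all lie within d reports nothing itself.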

The algorithm is trivially correct, so we only need to investigate its running time Q(n). As
before, the set of nodes examined in T forms a subtree $T_q$ of T. From Theorem 1, we derive that
$Q(n) = O(\sum_{v \in T_q} (k(v) + \log n))$, and since $k(v) \ge \lfloor \log_2 n \rfloor$,

$$Q(n) = O\Big(\sum_{v \in T_q} k(v)\Big).$$

Let v be any internal node of $T_q$. It follows directly from the algorithm that the k(v) neighbors
examined at node v all lie within d of q. As usual, let $N_q(v)$ denote this set of points. We can repeat
a previous argument to show that a constant fraction of the k(v) points of $N_q(v)$ are discovered at
node v for the first time (note that these are the points of $N_q(v)$ furthest away from q). Since the
total time required to visit all the children of v is $O(c\,k(v))$, it can be accounted for by the newly
discovered neighbors. This scheme involves charging the cost incurred by each node to its parent,
except for the root of $T_q$ that will also bear its own cost, i.e., $O(\log n)$ time. We thus derive the
relation

$$Q(n) = O(ck + \log n). \qquad (15)$$

Relations  (6)  and  (15)  allow  us  to  conclude 

Theorem 4. It is possible to solve the circular range search problem in $O(k + \log n)$ time, using
$O(n^{1+\epsilon})$ space; k denotes the size of the output and $\epsilon$ an arbitrarily small positive real number.

5.3.  A  More  Space-efficient  Algorithm 

Applying  the  very  same  technique  developed  in  Section  4.3,  yet  discarding  the  priority  queue 
— for  which  we  have  no  use  here —  we  derive  the  following  result. 

Theorem 5. It is possible to solve the circular range search problem in $O(k \log^2 n)$ time, using
$O(n \log n)$ space; k denotes the size of the output.
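
One way to read this adaptation is sketched below in Python; it is an interpretation under stated assumptions rather than the authors' procedure. Each node of the binary tree T answers a nearest-neighbor query (done here by brute force, standing in for point location in $\mathrm{Vor}_1(S(v))$), a subtree is abandoned as soon as its nearest neighbor lies further than d from q, and no priority queue is kept since the output need not be sorted. The name range_search_nn is hypothetical.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]


def range_search_nn(points: List[Point], q: Point, d: float) -> List[Point]:
    pts = sorted(points)                  # leaves of T in ascending x-order
    out: List[Point] = []

    def visit(lo: int, hi: int) -> None:
        # Brute-force stand-in for point location in Vor_1(S(v)),
        # where S(v) = pts[lo:hi].
        i = min(range(lo, hi), key=lambda t: dist(pts[t], q))
        if dist(pts[i], q) > d:
            return                        # no point of S(v) lies within d of q
        out.append(pts[i])
        # Descend towards the leaf storing pts[i], searching only the
        # siblings of the nodes on the path.
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if i < mid:
                visit(mid, hi)
                hi = mid
            else:
                visit(lo, mid)
                lo = mid

    if pts:
        visit(0, len(pts))
    return out
```

The siblings met on the way down partition S(v) minus the reported point, so every point within d of q is reported exactly once.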

6.  Some  Preprocessing  Time  Considerations 

What is the time required to organize the various data structures introduced in this paper?
In the case of Theorem 1, we can use Lee's $O(k^2 n \log n)$ time algorithm to construct the order-k
Voronoi diagram of S [8]. Since $\mathrm{Vor}_k(S)$ has $O(k(n - k))$ vertices, the decomposition of its dual
into subtrees $T_1, \ldots, T_q$ can be carried out in $O(k(n - k) \log n)$ time (see Lemma in Section 3). The
remaining preprocessing can be easily shown to require $O(k \times |\mathrm{Vor}_k(S)|)$ operations, therefore the
overall computation requires $O(k^2 n \log n)$ time.


The algorithms of Theorems 2 and 4 involve the same type of preprocessing. Since several
Voronoi diagrams are needed, we can use the ingenious representation of the diagrams
$\mathrm{Vor}_1(S), \mathrm{Vor}_2(S), \ldots$ recently discovered by Edelsbrunner et al. [6]. This scheme allows us to
construct and represent any diagram $\mathrm{Vor}_k(S)$, along with all its neighbor-lists, in time
$O(k^2(n - k))$, after $O(n^3)$ preprocessing. This shows that the time required to organize the data
structure for the algorithm of Theorem 2 is $O(n^3)$ for each level, i.e., $O(n^3 \log n)$ for the entire
structure. It is easy to see that the same result holds true for the algorithm of Theorem 4.

Finally, concerning Theorems 3 and 5, recall that the corresponding data structures have already
been shown to require $O(n \log n)$ time for their construction.

7.  Conclusions 

The  contribution  of  this  work  has  been  to  propose  an  economical  method  for  representing 
higher-order  Voronoi  diagrams  and  demonstrate  its  power  by  describing  improved  algorithms  for  a 
number  of  near-neighbor  problems,  in  particular,  for  the  circular  range  search  problem,  which  had 
previously defied efficient solutions. Several open questions deserve investigation. Order-k Voronoi
diagrams are powerful tools, yet prohibitively expensive for large values of k. We partly overcame
this shortcoming by using filtering search. This removed the need for very high-order Voronoi
diagrams, yet failed to reduce the space requirement to $O(n \times \mathrm{POLYLOG}(n))$. Whether this bound
can be achieved, as is the case for the range query problem [4], is an interesting open question.

Another area worthy of study concerns the existence of efficient near-neighbor algorithms
that use only linear space. As mentioned earlier, F. Yao provided a partial answer to this question
by giving a linear space algorithm for the circular range search problem with sublinear query
time [12]. Can a similar algorithm be devised for solving the k-nearest neighbor problem?